Data Source

Abstract

The novel coronavirus became a global pandemic in early 2020, and governments worldwide were tasked with working out how to emerge from it. Understandably, there was a large focus on vaccines. The objective of this study is to better understand how, throughout 2021, the vaccines affected the severity of COVID and the transmission of cases. The data was collected by Our World in Data in partnership with Oxford University and covers countries on every continent. Responsibility for reporting the data fell to the individual nations.

Objectives

Analyse the following hypothesis:

Define Variables for the study

Independent variables (Standardized x hundred)

Dependent variables (Standardized x hundred)

Latent/control variables (Standardized x hundred)

Understanding the variables

                   n                               
 [1,] "CONT"       "continent"                     
 [2,] "COUN"       "location"                      
 [3,] "CASES.T"    "total_cases_per_million"       
 [4,] "CASES.N"    "new_cases_per_million"         
 [5,] "DEATHS.T"   "total_deaths_per_million"      
 [6,] "DEATHS.N"   "new_deaths_per_million"        
 [7,] "ICU"        "icu_patients_per_million"      
 [8,] "HOSP"       "hosp_patients_per_million"     
 [9,] "VAC.T"      "total_vaccinations"            
[10,] "PEOPLE.V"   "people_vaccinated"             
[11,] "VAC.p"      "total_vaccinations_per_hundred"
[12,] "PEOPLE.V.p" "people_vaccinated_per_hundred" 
[13,] "POP"        "population"                    
[14,] "GDP.PC"     "gdp_per_capita"                
[15,] "HEART.R"    "cardiovasc_death_rate"         
[16,] "BEDS"       "hospital_beds_per_thousand"    
[17,] "HDI"        "human_development_index"       
[18,] "DATE"       "date.1"                        

Firstly, vaccinations do not appear to reduce the number of cases. With every one-unit increase in vaccinations per 100 people, there is an increase of 0.55 cases. One possible explanation is that the presence of a vaccine encouraged societies to take less care and fast-tracked governments' relaxation of the rules.

Secondly, there is a clear reduction in ICU patients as vaccinations per hundred people increase. With every one-unit increase in vaccinations per 100 people, there is a decrease of 0.24 in the number of ICU patients per million. This suggests the vaccine is reducing the severity of COVID much more than the transmission of cases.

Furthermore, the number of new deaths per million decreases by 0.08 with each one-unit increase in vaccinations per 100 people, strengthening the idea that the severity of COVID and, as a by-product, deaths will be reduced as vaccine take-up increases.

From this we can settle on the following hypotheses:

H1 - An increase in the number of people vaccinated will reduce the number of deaths

H2 - An increase in the number of people vaccinated will reduce the severity of COVID

H3 - An increase in the number of people vaccinated will not affect the transmission of cases.

Plots

The above graph shows how the number of deaths in Spain evolved between May 2020 and January 2022. It should be made clear that this is the number of new deaths, not cumulative deaths, which explains the volatile nature of the data.

Regressions


Call:
lm(formula = d$DEATHS.N ~ d$CASES.N + d$PEOPLE.V.p)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.525   0.011   1.194   2.031  14.468 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.454575   0.032097 -107.63   <2e-16 ***
d$CASES.N     0.606850   0.004508  134.62   <2e-16 ***
d$PEOPLE.V.p -0.201922   0.008634  -23.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.528 on 40961 degrees of freedom
  (122595 observations deleted due to missingness)
Multiple R-squared:  0.3071,    Adjusted R-squared:  0.3071 
F-statistic:  9079 on 2 and 40961 DF,  p-value: < 2.2e-16

Call:
lm(formula = d$DEATHS.N ~ d$PEOPLE.V.p + factor(d$CONT))

Residuals:
    Min      1Q  Median      3Q     Max 
-10.068  -1.806   1.370   2.878  11.842 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  0.10401    0.06087   1.709   0.0875 .  
d$PEOPLE.V.p                -0.10618    0.01016 -10.452  < 2e-16 ***
factor(d$CONT)Africa        -3.87812    0.08481 -45.729  < 2e-16 ***
factor(d$CONT)Asia          -2.38576    0.07030 -33.939  < 2e-16 ***
factor(d$CONT)Europe        -1.33249    0.06743 -19.762  < 2e-16 ***
factor(d$CONT)North America -2.18187    0.08074 -27.025  < 2e-16 ***
factor(d$CONT)Oceania       -6.29534    0.14311 -43.989  < 2e-16 ***
factor(d$CONT)South America  0.43593    0.08879   4.910 9.16e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.011 on 40989 degrees of freedom
  (122562 observations deleted due to missingness)
Multiple R-squared:  0.1045,    Adjusted R-squared:  0.1044 
F-statistic: 683.4 on 7 and 40989 DF,  p-value: < 2.2e-16

Based on the first regression above, where deaths are standardized per million, we have evidence to support hypothesis one. As expected, an increase in the number of new cases is associated with an increase in the number of deaths, while an increase in people vaccinated per hundred is associated with a decrease. Holding new cases constant, each one-unit increase in people vaccinated per hundred corresponds to a decrease of about 0.20 new deaths per million (coefficient -0.201922), supporting the notion that more vaccinations are associated with fewer deaths.
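The interpretation of a slope as "change in new deaths per one-unit change in a predictor, holding the other fixed" can be illustrated with a minimal sketch. The data frame `d_toy` and every number in it are invented for illustration and are not the study data; only the formula mirrors the first regression, written here with a `data` argument instead of `d$`.

```r
# Toy data only: these numbers are invented for illustration and are NOT
# taken from the Our World in Data extract used in the study.
set.seed(1)
n <- 200
d_toy <- data.frame(
  CASES.N    = runif(n, 0, 10),  # new cases per million (toy scale)
  PEOPLE.V.p = runif(n, 0, 5)    # people vaccinated per hundred (toy scale)
)
# Simulate deaths that rise with cases and fall with vaccination
d_toy$DEATHS.N <- 0.6 * d_toy$CASES.N - 0.2 * d_toy$PEOPLE.V.p +
  rnorm(n, sd = 0.5)

# Same formula as the first regression, written with a data argument
fit <- lm(DEATHS.N ~ CASES.N + PEOPLE.V.p, data = d_toy)

# Estimated change in new deaths for a one-unit rise in each predictor,
# holding the other predictor fixed
slopes <- coef(fit)[c("CASES.N", "PEOPLE.V.p")]
print(round(slopes, 3))
```

Passing `data =` keeps the coefficient names short (`PEOPLE.V.p` rather than `d$PEOPLE.V.p`) and makes the fitted model usable with `predict()` on new data.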

We can also see that new deaths are relatively comparable across continents (CONT), which leads us to believe there is less merit in exploring this relationship.


Call:
lm(formula = ES$DEATHS.N ~ ES$CASES.N + ES$HOSP + ES$ICU + ES$PEOPLE.V.p)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.5238 -0.3988  0.1446  0.4841  1.6498 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -2.414659   0.524614  -4.603 6.32e-06 ***
ES$CASES.N     0.653489   0.009143  71.475  < 2e-16 ***
ES$HOSP       -0.872072   0.185441  -4.703 4.03e-06 ***
ES$ICU         1.277205   0.258540   4.940 1.34e-06 ***
ES$PEOPLE.V.p -0.226833   0.056441  -4.019 7.51e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9231 on 281 degrees of freedom
  (466 observations deleted due to missingness)
Multiple R-squared:  0.9513,    Adjusted R-squared:  0.9506 
F-statistic:  1371 on 4 and 281 DF,  p-value: < 2.2e-16

The regression above makes interesting reading. The coefficient on people vaccinated per hundred in Spain extends the analysis from the Pairs Panel above: as vaccinations increase, the number of deaths decreases. Interestingly, the total number of vaccinations per hundred people has increased the number of ICU patients per million. This is likely a reflection of the booster doses that have been issued, because they are unlikely to reduce the number of ICU patients per million further and instead maintain the level that the initial rounds of vaccination achieved.

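# Draw the two scatterplots side by side
par(mfrow = c(1, 2))
# A formula with two predictors yields one plot of ICU against each of them
plot(ICU ~ PEOPLE.V.p + HOSP, data = ES)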

    chisq        df    pvalue     rmsea 
2333.0385    7.0000    0.0000    1.0779 

   chisq       df   pvalue    rmsea 
626.0573   4.0000   0.0000   0.7374 

As previously explained, there is a negative correlation between people vaccinated per hundred and the number of ICU patients per million. This is best depicted in the graphs above. At 0 vaccinations per hundred, there are close to 100 ICU patients per million. It should be noted that Spanish hospitals were reported to be at capacity throughout the pandemic, which suggests that, without capacity constraints, these numbers may have been higher. As the number of people vaccinated increased, the number of ICU patients decreased.

The graphs remain relatively volatile throughout the range of number of people vaccinated per hundred which can be interpreted to represent the evolving nature of the virus. The number of people vaccinated could be used as a proxy for time, as this number is only capable of increasing. Therefore the peaks on the graphs above are likely to represent the waves of the pandemic.

Model proposed by Albert

For this model, four latent variables were defined: Cases, Vaccination, Hospitalization and Deaths. Each is measured by four indicators, namely the daily average per million in each quarter of cases, vaccinations, hospitalizations and deaths respectively.
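As a rough sketch of how quarterly indicators of this kind could be derived from daily records, the base R snippet below averages a daily series per country and quarter and then spreads the quarters into columns. The report does not show its own aggregation code, so the data frame `daily`, its values and the helper names `QUARTER`, `q_long` and `q_wide` are all hypothetical.

```r
# Toy daily records; values are invented for illustration only
daily <- data.frame(
  COUN    = rep(c("A", "B"), each = 4),
  DATE    = as.Date(c("2021-01-15", "2021-02-15", "2021-04-15", "2021-05-15",
                      "2021-01-15", "2021-02-15", "2021-04-15", "2021-05-15")),
  CASES.N = c(10, 20, 30, 50, 1, 3, 5, 7)
)

# quarters() labels each date "Q1" to "Q4"
daily$QUARTER <- quarters(daily$DATE)

# Mean of the daily values per country and quarter (long format)
q_long <- aggregate(CASES.N ~ COUN + QUARTER, data = daily, FUN = mean)

# One row per country, one column per quarter (CASES.NQ1, CASES.NQ2, ...)
q_wide <- reshape(q_long, idvar = "COUN", timevar = "QUARTER",
                  direction = "wide", sep = "")
print(q_wide)
```

The wide layout, with one column per variable and quarter, matches the shape of the column listing printed for this model.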

The bivariate analysis shows correlations among the variables. Initially, no multicollinearity is detected, meaning there is variation between the variables and the periods.

 [1] "Country"      "GDPQ1"        "HDIQ1"        "LifeExpQ1"    "BedsQ1"      
 [6] "UCIQ1"        "HrthAttcksQ1" "DiabetesQ1"   "Deaths.NQ1"   "People.VQ1"  
[11] "HospQ1"       "Cases.NQ1"    "GDPQ2"        "HDIQ2"        "LifeExpQ2"   
[16] "BedsQ2"       "UCIQ2"        "HrthAttcksQ2" "DiabetesQ2"   "Deaths.NQ2"  
[21] "People.VQ2"   "HospQ2"       "Cases.NQ2"    "GDPQ3"        "HDIQ3"       
[26] "LifeExpQ3"    "BedsQ3"       "UCIQ3"        "HrthAttcksQ3" "DiabetesQ3"  
[31] "Deaths.NQ3"   "People.VQ3"   "HospQ3"       "Cases.NQ3"    "GDPQ4"       
[36] "HDIQ4"        "LifeExpQ4"    "BedsQ4"       "UCIQ4"        "HrthAttcksQ4"
[41] "DiabetesQ4"   "Deaths.NQ4"   "People.VQ4"   "HospQ4"       "Cases.NQ4"   

   chisq       df   pvalue    rmsea 
683.6484 111.0000   0.0000   0.3839 
lavaan 0.6-10 ended normally after 308 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        25
                                                      
                                                  Used       Total
  Number of observations                            35         238
                                                                  
Model Test User Model:
                                                      
  Test statistic                               683.648
  Degrees of freedom                               111
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate   Std.Err  z-value  P(>|z|)
  Ca =~                                                
    Cases.NQ1          1.000                           
    Cases.NQ2          1.000                           
    Cases.NQ3          1.000                           
    Cases.NQ4          1.000                           
  Hos =~                                               
    HospQ1             1.000                           
    HospQ2             1.000                           
    HospQ3             1.000                           
    HospQ4             1.000                           
  Dea =~                                               
    Deaths.NQ1         1.000                           
    Deaths.NQ2         1.000                           
    Deaths.NQ3         1.000                           
    Deaths.NQ4         1.000                           
  Vac =~                                               
    People.VQ1         1.000                           
    People.VQ2         1.000                           
    People.VQ3         1.000                           
    People.VQ4         1.000                           

Regressions:
                   Estimate   Std.Err   z-value  P(>|z|)
  Dea ~                                                 
    Ca                 0.001     0.004    0.312    0.755
    Hos               -0.024     1.846   -0.013    0.989
    Vac               -2.231   116.084   -0.019    0.985
  Hos ~                                                 
    Vac              -63.462    35.089   -1.809    0.071

Covariances:
                   Estimate   Std.Err   z-value  P(>|z|)
  Ca ~~                                                 
    Vac              -82.771    58.901   -1.405    0.160

Variances:
                   Estimate   Std.Err   z-value  P(>|z|)
   .Cases.NQ1      34767.876  8681.043    4.005    0.000
   .Cases.NQ2       3439.116  1749.604    1.966    0.049
   .Cases.NQ3      20599.345  5310.638    3.879    0.000
   .Cases.NQ4      62065.619 15196.692    4.084    0.000
   .HospQ1         13015.753  3425.370    3.800    0.000
   .HospQ2          3096.670  1253.704    2.470    0.014
   .HospQ3         19230.912  4898.543    3.926    0.000
   .HospQ4          8727.017  2421.593    3.604    0.000
   .Deaths.NQ1        13.016     3.073    4.235    0.000
   .Deaths.NQ2         2.318     0.593    3.906    0.000
   .Deaths.NQ3         3.020     0.745    4.054    0.000
   .Deaths.NQ4         6.715     1.588    4.230    0.000
   .People.VQ1        23.888     6.443    3.708    0.000
   .People.VQ2       101.341    24.458    4.143    0.000
   .People.VQ3       134.993    32.463    4.158    0.000
   .People.VQ4       118.692    28.583    4.153    0.000
    Ca              4447.226  1961.749    2.267    0.023
   .Hos             -226.655 11878.744   -0.019    0.985
   .Dea               -0.136    14.860   -0.009    0.993
    Vac                5.998     4.543    1.320    0.187

In this model we see strong correlations between the latent variables Cases, Vaccination, Deaths and Hospitalization and their respective quarterly indicators. Furthermore, Vaccination has a high negative beta regression coefficient (-3.30) against Deaths, meaning that, as expected, the more vaccinations are administered, the fewer deaths are expected. Vaccination also has a high negative beta regression coefficient (-1) with Hospitalization, meaning vaccines are helping to decrease severe cases. Cases has a positive beta regression coefficient with Deaths, meaning that more cases are expected to bring more deaths. Note, however, that in the unstandardized output above none of these paths is significant at the 5% level (p = 0.985, 0.071 and 0.755 respectively).

Finally, this model indicates, with a high negative beta regression coefficient, that hospitalization decreases deaths. The principal assumption is that the higher the number of hospitalizations, the fewer deaths will occur; countries should therefore invest heavily in hospitals to diminish COVID severity.

The disturbance terms for Cases are 0.89, 0.44, 0.82 and 0.93 for Q1, Q2, Q3 and Q4, meaning that 11%, 56%, 18% and 7% of the variance in each quarter is caused by variables not controlled for. The disturbances are mostly low, probably because COVID spreads similarly among similar countries, so there are not many external variables that explain variations in new cases.

The disturbance terms for Vaccination are 0.8, 0.94, 0.96 and 0.95 for Q1, Q2, Q3 and Q4, meaning that 20%, 6%, 4% and 5% of the variance in each quarter is caused by variables not controlled for. The low disturbance terms are satisfactory for the model: the number of vaccinations per million administered should not depend on many external variables other than the country, since vaccines were distributed around the world according to the population of each country.
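The percentages quoted for Cases and Vaccination follow from taking the complement of each reported figure. A minimal sketch of that arithmetic, assuming the reported figures are read as the share of variance explained in each quarter (the vector names are hypothetical):

```r
# Values reported in the text for Q1 to Q4, read here as explained shares
cases_explained <- c(Q1 = 0.89, Q2 = 0.44, Q3 = 0.82, Q4 = 0.93)
vac_explained   <- c(Q1 = 0.80, Q2 = 0.94, Q3 = 0.96, Q4 = 0.95)

# Unexplained share of the variance in each quarter, in percent
cases_unexplained <- round(100 * (1 - cases_explained))
vac_unexplained   <- round(100 * (1 - vac_explained))

print(cases_unexplained)
print(vac_unexplained)
```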

However, the disturbance terms for Hospitalization seem to be higher. This is because not only hospital capacity but also health investment and lifestyle differ greatly from country to country.

This model presents some issues: the p-value is very low and the degrees of freedom are high (chi-square = 683.648 on 111 degrees of freedom), meaning it is not reliable. Further investigation is needed to find a more suitable model, which is why modification indices were used to look for a better fit.
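The fit statistics printed with each model can be cross-checked by hand. The sketch below computes the RMSEA point estimate from the chi-square statistic, its degrees of freedom and the number of observations used. The helper `rmsea()` is written for this illustration and is not a lavaan function; the denominator uses N rather than N - 1 because that variant reproduces the values printed in this report. Commonly cited guideline values for an acceptable RMSEA lie around 0.05 to 0.08, well below all three results.

```r
# RMSEA point estimate from a chi-square statistic, its degrees of freedom
# and the number of observations used. The N (rather than N - 1) denominator
# is the variant that reproduces the values printed in this report.
rmsea <- function(chisq, df, n) {
  sqrt(max((chisq - df) / (df * n), 0))
}

# Inputs copied from the three model summaries in this report
r_latent   <- rmsea(683.6484, 111, 35)   # four-latent-variable model
r_vac_case <- rmsea(61.9033, 8, 237)     # vaccination -> cases model
r_gdp_beds <- rmsea(49.5674, 14, 141)    # model with GDP and hospital beds

print(round(c(r_latent, r_vac_case, r_gdp_beds), 4))
```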

It is also worth noting that there is a lot of missing data, since some countries do not report accurately: only 35 of the 238 observations could be used.

 [1] "Country"      "GDPQ1"        "HDIQ1"        "LifeExpQ1"    "BedsQ1"      
 [6] "UCIQ1"        "HrthAttcksQ1" "DiabetesQ1"   "Deaths.NQ1"   "People.VQ1"  
[11] "HospQ1"       "Cases.NQ1"    "GDPQ2"        "HDIQ2"        "LifeExpQ2"   
[16] "BedsQ2"       "UCIQ2"        "HrthAttcksQ2" "DiabetesQ2"   "Deaths.NQ2"  
[21] "People.VQ2"   "HospQ2"       "Cases.NQ2"    "GDPQ3"        "HDIQ3"       
[26] "LifeExpQ3"    "BedsQ3"       "UCIQ3"        "HrthAttcksQ3" "DiabetesQ3"  
[31] "Deaths.NQ3"   "People.VQ3"   "HospQ3"       "Cases.NQ3"    "GDPQ4"       
[36] "HDIQ4"        "LifeExpQ4"    "BedsQ4"       "UCIQ4"        "HrthAttcksQ4"
[41] "DiabetesQ4"   "Deaths.NQ4"   "People.VQ4"   "HospQ4"       "Cases.NQ4"   
[46] "lc1"          "lc2"          "lc3"          "lc4"          "lh1"         
[51] "lh2"          "lh3"          "lh4"          "ld1"          "ld2"         
[56] "ld3"          "ld4"          "lv1"          "lv2"          "lv3"         
[61] "lv4"         
 [1] "lc1"    "lc2"    "lc3"    "lc4"    "lv1"    "lv2"    "lv3"    "lv4"   
 [9] "GDPQ1"  "BedsQ1"

lavaan 0.6-10 ended normally after 55 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        27
                                                      
                                                  Used       Total
  Number of observations                           237         238
  Number of missing patterns                        16            
                                                                  
Model Test User Model:
                                                      
  Test statistic                                61.903
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Ca =~                                               
    lc1               1.000                           
    lc2               0.995    0.061   16.397    0.000
    lc3               1.001    0.089   11.199    0.000
    lc4               1.258    0.108   11.621    0.000
  Vac =~                                              
    lv2               1.000                           
    lv3               1.174    0.066   17.765    0.000
    lv4               0.977    0.080   12.231    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Ca ~                                                
    Vac               0.993    0.119    8.366    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .lc1 ~~                                              
   .lc2               0.543    0.172    3.152    0.002
 .lc2 ~~                                              
   .lc4              -0.414    0.100   -4.162    0.000
 .lv2 ~~                                              
   .lv4              -0.089    0.050   -1.773    0.076
   .lv3               0.068    0.086    0.799    0.425
 .lc3 ~~                                              
   .lc4               0.090    0.188    0.480    0.631

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .lc1               3.498    0.132   26.595    0.000
   .lc2               3.429    0.122   28.085    0.000
   .lc3               3.890    0.119   32.581    0.000
   .lc4               3.858    0.140   27.539    0.000
   .lv2               1.694    0.085   20.041    0.000
   .lv3               2.724    0.085   31.918    0.000
   .lv4               3.334    0.074   44.827    0.000
   .Ca                0.000                           
    Vac               0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .lc1               1.571    0.213    7.365    0.000
   .lc2               1.065    0.182    5.848    0.000
   .lc3               0.897    0.176    5.108    0.000
   .lc4               0.772    0.256    3.016    0.003
   .lv2               0.410    0.128    3.197    0.001
   .lv3               0.023    0.044    0.519    0.603
   .lv4               0.120    0.032    3.702    0.000
   .Ca                1.176    0.193    6.097    0.000
    Vac               1.213    0.197    6.156    0.000

  chisq      df  pvalue   rmsea 
61.9033  8.0000  0.0000  0.1686 
   lhs op rhs     mi    epc sepc.lv sepc.all sepc.nox
50 lc3 ~~ lv4 15.502  0.079   0.079    0.239    0.239
45 lc2 ~~ lv2 11.889 -0.113  -0.113   -0.171   -0.171
49 lc3 ~~ lv3  8.695 -0.046  -0.046   -0.322   -0.322
41 lc1 ~~ lv2  6.323  0.086   0.086    0.107    0.107
53 lc4 ~~ lv4  5.349 -0.057  -0.057   -0.188   -0.188

In contrast to previous models, the above model is not affected by the high volume of missing data. Previously, the reporting of hospitalizations left us with very high numbers of missing values.

The aggregation of missing data, as shown above, reflects the high level of missingness in quarter 1 of 2021. This stems from the pace of the vaccine roll-out around the world, and quarter 1 has therefore been left out of this model.
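A per-variable missingness summary of this kind can be approximated in base R. The snippet below is a sketch on an invented table: the data frame `toy` and its values are hypothetical, and only the column names echo the log-transformed indicators listed earlier.

```r
# Toy table with gaps; values are invented for illustration only
toy <- data.frame(
  lv1 = c(NA, NA, NA, 0.4),
  lv2 = c(1.2, NA, 1.9, 2.1),
  lc1 = c(3.1, 3.6, 2.8, 4.0)
)

# Share of missing values per column, highest first
miss_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(miss_share)

# Rows with no missing value in any column (what listwise deletion keeps)
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

A column with a very high missing share, such as `lv1` in this toy table, sharply reduces the number of complete rows, which is the motivation for dropping the first-quarter vaccination indicator.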

The model shows us the effect of vaccination on cases. As previously mentioned, we did not expect the vaccine roll out to reduce the number of cases and thus the transmission of the virus. This is evident here as the estimate of the regression is 0.993. In context, this is likely to be explained by the Omicron variant. Furthermore, as vaccines were rolled out worldwide, governments were more lenient with restrictions, aware that this would increase the number of cases but confident that as the vaccines took hold the severity of COVID would reduce.

With more reliable data, we could implement the aforementioned model in which latent variables were assigned to assess the severity of COVID and prove this hypothesis. Regrettably, without this data, this explanation will have to suffice.

Whilst we still have some modification indices that are high, this is the limit of where we can introduce correlations between variables and still receive an output including standard errors. In the model, some of the correlations have been frozen, meaning they will not be included in the model but show where the desired correlations would be introduced if possible.

lavaan 0.6-10 ended normally after 32 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16
                                                      
                                                  Used       Total
  Number of observations                           141         238
                                                                  
Model Test User Model:
                                                      
  Test statistic                                49.567
  Degrees of freedom                                14
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Ca =~                                               
    lc1               1.000                           
    lc2               0.797    0.084    9.441    0.000
    lc3               0.766    0.097    7.923    0.000
    lc4               1.323    0.137    9.684    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Ca ~                                                
    lv2               0.315    0.173    1.825    0.068
    lv3               0.260    0.340    0.766    0.443
    lv4              -0.057    0.282   -0.203    0.839
    lbed              0.502    0.118    4.274    0.000
    lgdp              0.238    0.122    1.947    0.051

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .lc1 ~~                                              
   .lc2               0.929    0.183    5.085    0.000
 .lc2 ~~                                              
   .lc4              -0.144    0.112   -1.290    0.197
   .lc3               0.304    0.106    2.857    0.004

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .lc1               1.625    0.218    7.445    0.000
   .lc2               1.614    0.223    7.222    0.000
   .lc3               0.920    0.124    7.406    0.000
   .lc4               0.542    0.157    3.447    0.001
   .Ca                0.489    0.120    4.064    0.000

  chisq      df  pvalue   rmsea 
49.5674 14.0000  0.0000  0.1342 
   lhs op rhs    mi    epc sepc.all delta   ncp power decision
34 lc1 ~~ lc4 1.266 -0.203   -0.217   0.1 0.306 0.086      (i)
35 lc3 ~~ lc4 0.672  0.114    0.161   0.1 0.519 0.111      (i)
33 lc1 ~~ lc3 0.129  0.044    0.036   0.1 0.672 0.130      (i)

Based on the above model, we can see the effect that a country's GDP and number of hospital beds per thousand have on its number of cases. Focusing on the Path Diagram, we can see that both variables have a positive effect on the number of cases. Number of Hospital Beds and GDP act as proxies for the level of development of the country, suggesting that more developed countries had more cases of COVID throughout the period analyzed. Initially this may seem counter-intuitive; however, it is more likely to be a reflection of how effectively the countries reported the data.

This model also suffers from missingness, although not to the same extent as the initial model, which suggests that there are reliability issues around how well GDP and Hospital Beds were reported. As we can see from the aggregation of missingness analysis, the levels of missingness in Hospital Beds and GDP are relatively high.

The modification indices of this model are very low, as are the values in the EPC column, which shows how much the model could change if the correlations were included. From the modification indices, we can conclude that the effect of the rhs on the lhs is insignificant and would not have a great effect on the model if it were to be included.

Summary

In summary, we have identified three hypotheses relating to the effectiveness of the vaccines and found support for Hypothesis 3, that increases in vaccinations in the population will not reduce the number of cases. Furthermore, we identified a model that sought to incorporate severity of COVID and number of deaths in order to understand the vaccines' effect on them. However, this study was hampered by missingness within the data.

An opportunity for further research would be to collect a more complete data set to further analyze the effectiveness of the vaccines on severity of COVID and deaths.